102 research outputs found

    Database integrated analytics using R : initial experiences with SQL-Server + R

    © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Most data scientists nowadays use functional or semi-functional languages like SQL, Scala or R to process data obtained directly from databases. This process requires fetching the data, processing it, and storing it again, and it tends to be done outside the DB, often in complex data-flows. Recently, database service providers have decided to integrate “R-as-a-Service” into their DB solutions. The analytics engine is called directly from the SQL query tree, and results are returned as part of the same query. Here we give a first taste of this technology by testing the portability of our ALOJA-ML analytics framework, coded in R, to Microsoft SQL-Server 2016, one of the recently released SQL+R solutions. In this work we discuss some data-flow schemes for porting a local DB + analytics engine architecture towards Big Data, focusing especially on the new DB Integrated Analytics approach, and commenting on the first experiences in usability and performance obtained from these new services and capabilities. Peer Reviewed. Postprint (author's final draft).

    ALOJA: A benchmarking and predictive platform for big data performance analysis

    The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of the cost-effectiveness of Big Data deployments. The development of the project over its first year has resulted in an open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about systems' cost-performance. This article describes the evolution of the project's focus and research lines over more than a year of continuously benchmarking Hadoop under different configuration and deployment options, presents results, and discusses the motivation, both technical and market-based, for such changes. During this time, ALOJA's target has evolved from low-level profiling of the Hadoop runtime, through extensive benchmarking and evaluation of a large body of results via aggregation, to currently leveraging Predictive Analytics (PA) techniques. Modeling benchmark executions allows us to estimate the results of new or untested configurations or hardware set-ups automatically, by learning from past observations, saving benchmarking time and costs. This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051). Peer Reviewed. Postprint (author's final draft).

    A model-based approach to assess the effectiveness of pest biocontrol by natural enemies

    Main goal: The aim of this note is to propose a modeling approach for assessing the effectiveness of pest biocontrol by natural enemies in diversified agricultural landscapes including several pesticide-based management strategies. Our approach combines a stochastic landscape model with a spatially-explicit model of population dynamics. It enables us to analyze the effect of the landscape composition (proportion of semi-natural habitat, non-treated crops, slightly treated crops and conventionally treated crops) on the effectiveness of pest biocontrol. Effectiveness is measured through environmental and agronomical descriptors, measuring respectively the impact of the pesticides on the environment and the average agronomic productivity of the whole landscape, taking into account losses caused by pests. Conclusions: The effectiveness of the pesticide, the intensity of the treatment and the pest intrinsic growth rate are found to be the main drivers of landscape productivity. The loss in productivity due to a reduced use of pesticide can be partly compensated by biocontrol. However, the model suggests that it is not possible to maintain a constant level of productivity while reducing the use of pesticides, even with highly efficient natural enemies. Fragmentation of the semi-natural habitats and increased crop rotation tend to slightly enhance the effectiveness of biocontrol but have a marginal effect compared to the predation rate by natural enemies. This note was written in the framework of the ANR project PEERLESS "Predictive Ecological Engineering for Landscape Ecosystem Services and Sustainability" (ANR-12-AGRO-0006).

    ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

    This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051. Peer Reviewed. Postprint (published version).

    AUGURES : profit-aware web infrastructure management

    Over the last decade, advances in technology, together with the increasing use of the Internet for everyday tasks, have been causing profound changes in end-users, as well as in businesses and technology providers. The widespread adoption of high-speed and ubiquitous Internet access is also changing the way users interact with Web applications and their expectations in terms of Quality-of-Service (QoS) and User eXperience (UX). Recently, Cloud computing has been rapidly adopted to host and manage Web applications, due to its inherent cost effectiveness and on-demand scaling of infrastructures. However, system administrators still need to make manual decisions about the parameters that affect the business results of their applications, i.e., setting QoS targets and defining metrics for scaling the number of servers during the day. Therefore, understanding the workload and user behavior (the demand) poses new challenges for capacity planning and scalability (the supply), and ultimately for the success of a Web site. This thesis contributes to the current state of the art of Web infrastructure management by providing: i) a methodology for predicting Web session revenue; ii) a methodology to determine the effect of high response times on sales; and iii) a policy for profit-aware resource management that relates server capacity to QoS and sales. The approach leverages Machine Learning (ML) techniques on custom, real-life datasets from an Ecommerce retailer featuring popular Web applications. The experimentation shows how user behavior and server performance models can be built from offline information to determine how demand and supply relations work as resources are consumed. This produces economic metrics that are consumed by profit-aware policies, which allow the self-configuration of cloud infrastructures to an optimal number of servers under a variety of conditions. At the same time, the thesis provides several insights applicable to improving Autonomic infrastructure management and the profitability of Ecommerce applications.
    Over the last decade, advances in technology, together with the growing use of the Internet, have been causing changes in end-users as well as in businesses and technology providers. The mass adoption of ubiquitous high-speed Internet access is changing the way users interact with Web applications and their expectations regarding the Quality-of-Service (QoS) and User eXperience (UX) offered. Recently, the Cloud computing model has been rapidly adopted to host and manage Web applications, owing to its inherent cost effectiveness and on-demand servers. However, system administrators still have to make manual decisions about the execution parameters that affect business results, e.g., defining QoS targets and metrics for scaling the number of servers. For these reasons, understanding the workload and user behavior (the demand) poses new challenges for capacity planning and scalability (the supply), and ultimately for the success of a Web site. This thesis contributes to the current state of the art in Web infrastructure management by presenting: i) a methodology for predicting the revenue of a Web session; ii) a methodology for determining the effect of high response times on sales; and iii) a policy for profit-based resource management that relates server capacity, QoS, and sales. The proposal is based on applying Machine Learning (ML) techniques to production data sources from an Ecommerce provider offering popular Web applications. The experiments show how user behavior and server performance models can be obtained from historical data in order to determine the relationship between demand and supply as resources are used. This yields economic metrics that are then applied in profit-based policies, enabling the self-configuration of Cloud infrastructures to an adequate number of servers. At the same time, the thesis provides relevant information for improving autonomous Web infrastructure management and increasing profits in Ecommerce applications.

    The state of SQL-on-Hadoop in the cloud

    Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS), analytical services like Hive and Spark come preconfigured for general-purpose use and ready to run, giving companies a quick entry point and on-demand deployment of ready SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry standard TPC-H benchmark. The study is framed within the ALOJA research project, which features an open source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines. The ALOJA Project aims to lower the total cost of ownership (TCO) of big data deployments and study their performance characteristics for optimization. The study benchmarks cloud providers across a diverse range of instance types, and uses input data scales from 1GB to 1TB, in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions, thereby establishing a common results base upon which subsequent research can be carried out by the project. Initial results already show the main performance trends with respect to both hardware and software configuration, as well as the pricing, similarities, and architectural differences of the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and computing resources while offering network-based elastic storage, others choose to keep the local processing model from Hadoop for high performance, at the cost of reduced flexibility. Results also show the importance of application-level tuning and how keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud. This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051). Peer Reviewed. Postprint (author's final draft).

    ALOJA-ML: a framework for automating characterization and knowledge discovery in Hadoop deployments

    This article presents ALOJA-Machine Learning (ALOJA-ML), an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and performance tuning; here we detail the approach, the efficacy of the model and initial results. The ALOJA-ML project is the latest phase of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, focusing on Hadoop. Hadoop presents a complex execution environment, where costs and performance depend on a large number of software (SW) configurations and on multiple hardware (HW) deployment choices. Recently the ALOJA project presented an open, vendor-neutral repository, featuring over 16,000 Hadoop executions. These results are accompanied by a test bed and tools to deploy and evaluate the cost-effectiveness of the different hardware configurations, parameter tunings, and Cloud services. Despite early success within ALOJA from expert-guided benchmarking, it became clear that a genuinely comprehensive study requires automation of modeling procedures to allow a systematic analysis of large and resource-constrained search spaces. ALOJA-ML provides such an automated system, allowing knowledge discovery by modeling Hadoop executions from observed benchmarks across a broad set of configuration parameters. The resulting empirically-derived performance models can be used to forecast the execution behavior of various workloads; they allow a-priori prediction of the execution times for new configurations and HW choices, and they offer a route to model-based anomaly detection. In addition, these models can guide the benchmarking exploration efficiently, by automatically prioritizing candidate future benchmark tests. Insights from ALOJA-ML's models can be used to reduce the operational time on clusters, speed up the data acquisition and knowledge discovery process, and, importantly, reduce running costs. In addition to learning from the methodology presented in this work, the community can benefit in general from ALOJA data-sets, framework, and derived insights to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR105. Peer Reviewed. Postprint (published version).

    Vibro-acoustic of table tennis rackets, influence of the blade plywood design. Experimental and sensory analyses

    The performance of a table tennis racket can be described with several adjectives, such as fast, slow, stiff, adhesive, controllable, etc. These descriptions are subjective, since they depend on the sensory analysis made by each player. It appears that the noise produced when the ball hits a racket has a great influence on the opinion that a player forms about that racket. Moreover, the sound emitted at the stroke can be judged differently by different players. Hence a good sound may give the player a favorable first impression of the racket. The work presented first demonstrates the correlation between the acoustic frequency spectrum and the vibration frequency spectrum of a racket following the ball impact. The analysis is first performed on racket blades without rubbers glued on. The vibration modes that produce the sound at the ball impact were identified experimentally and correlated with numerical predictions. It is shown that two main vibration modes are responsible for the sound emitted. Second, the influence of the blade plywood composition is studied. Several prototype racket blades were designed, differing in the thickness of the plies, the wood species, and the wood fiber orientation. The experimental results make it possible to state clearly how these design parameters affect the impact sound. Then the influence of the rubbers (coverings) glued on both sides of the blade is studied. The vibration modes are the same but the frequencies are lower. The sound can be described as sharp, long, clear, deep, hollow or plain. In the last part of this study, the experimental observations obtained in the laboratory are compared with the results of a sensory analysis performed with the same prototype rackets by a panel of high-level players. It is shown that the classification made by the players is consistent with the experimental observations.

    Vocal Culture in the Age of Laryngoscopy

    For several months beginning in 1884, readers of Life, Science, Health, the Atlantic Monthly and similar magazines would have encountered half-page advertisements for a newly patented medical device called the ‘ammoniaphone’ (Figure 2.1). Invented and promoted by a Scottish doctor named Carter Moffat and endorsed by the soprano Adelina Patti, British Prime Minister William Gladstone and the Princess of Wales, the ammoniaphone promised a miraculous transformation in the voices of its users. It was recommended for ‘vocalists, clergymen, public speakers, parliamentary men, readers, reciters, lecturers, leaders of psalmody, schoolmasters, amateurs, church choirs, barristers, and all persons who have to use their voices professionally, or who desire to greatly improve their speaking or singing tones’. Some estimates indicated that Moffat sold upwards of 30,000 units, yet the ammoniaphone was a flash in the pan as far as such things go, fading from public view after 1886.